ABSTRACT. Robert Machol's surprising result, that from a single observation it is possible to
have finite length confidence intervals for the parameters of location-scale models, is reproduced
and extended. Two previously unpublished modifications are included. First, Herbert Robbins'
nonparametric confidence interval is obtained. Second, I introduce a technique for obtaining confidence
intervals for the scale parameter of finite length in the logarithmic metric.



1. Introduction 

Let $x$ be an observation from a $N(\mu,\sigma^2)$ population with unknown parameters. The
following statement belongs to the folklore of Statistical Science: From a single observation
$x$ we cannot gain information about the variability in the population. Thus, finite length
confidence intervals for $\mu$ and/or $\sigma$ are impossible even in principle.

This is not correct. For example $x \pm 5|x|$ will cover $\mu$ at least 90% of the time and
$(0, 17|x|)$ will cover $\sigma$ at least 95% of the time. If you don't believe it, check it with your
PC!
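One way to take up the invitation is a small Monte Carlo check. The sketch below is an illustration
rather than part of the original argument: it assumes Python's standard `random` module, and the
function name `coverage`, the sample size and the seed are arbitrary choices made here. It draws $x$
from $N(\mu,\sigma^2)$ and counts how often $x \pm 5|x|$ covers $\mu$ and $(0, 17|x|)$ covers $\sigma$.

```python
import random

def coverage(mu, sigma, n=20000, seed=0):
    """Estimate how often x +/- 5|x| covers mu and (0, 17|x|) covers sigma.

    Illustrative helper (not from the paper): x is drawn from N(mu, sigma^2).
    """
    rng = random.Random(seed)
    hit_mu = 0
    hit_sigma = 0
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        # interval for mu: x - 5|x| < mu < x + 5|x|
        if abs(x - mu) < 5 * abs(x):
            hit_mu += 1
        # interval for sigma: 0 < sigma < 17|x|
        if sigma < 17 * abs(x):
            hit_sigma += 1
    return hit_mu / n, hit_sigma / n

if __name__ == "__main__":
    # the (mu, sigma) pairs are arbitrary test cases
    for mu, sigma in [(0.0, 1.0), (1.0, 1.0), (-3.0, 2.0), (10.0, 0.5)]:
        print(mu, sigma, coverage(mu, sigma))
```

Under these assumptions the estimated frequencies should stay at or above roughly 0.90 and 0.95 for
every pair tried. The case $\mu = 0$ is the least favorable one for the $\sigma$ interval.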

I first heard about this some years ago from Herbert Robbins. According to Robbins, 
this phenomenon was discovered by an electrical engineer in the 60's (Robert Machol, IEEE
Trans. Info. Theor., 1964) but it is still relatively unknown to statisticians.

I show Machol's idea below. The intervals for $\mu$ in the parametric case are due to him.
The nonparametric improvement is due to Robbins and the intervals on $\sigma$ are mine.

2. Confidence Intervals for $\mu$, Parametric Case

Consider the following problem. Given a single observation from a r.v.

$$X \sim \frac{1}{\sigma}\, f\!\left(\frac{x-\mu}{\sigma}\right), \quad \mu \in \mathbb{R},\ \sigma > 0 \ \text{unknown},$$

with $f$ a known density symmetric about zero, find a finite length $100(1-\beta)\%$ CI for $\mu$.
Machol's answer: Consider the event

$$A = [\,|X-\mu| > t|X-a|\,]$$

where $a \in \mathbb{R}$ is an arbitrary constant and $t > 1$ is given. We have

$$A = [\,|Y| > t|Y-\alpha|\,]$$

where

$$Y = \frac{X-\mu}{\sigma} \sim f(y) \quad \text{and} \quad \alpha = \frac{a-\mu}{\sigma} \in \mathbb{R}.$$

Fig. 1. Illustration of event A

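The event $A$ corresponds to the shaded piece in Fig. 1. Thus,

$$P(A) = P[\,|Y| > t|Y-\alpha|\,] = \left|\int_{\alpha t/(t+1)}^{\alpha t/(t-1)} f(y)\,dy\right| = \beta(\alpha,t)$$

and

$$P(A) \le \beta^*(t) = \sup_{\alpha} \beta(\alpha,t).$$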



Therefore

$$P[\,X - t|X-a| < \mu < X + t|X-a|\,] = P(A^c) \ge 1 - \beta^*(t).$$

Hence, provided that $\beta^*(t) \to 0$ as $t \to \infty$, the interval $X \pm t|X-a|$ can be made to have
any pre-specified confidence.

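Example: Take $f(y) = \phi(y)$, the pdf of $N(0,1)$. From the symmetry of $\phi$ about zero we
can write

$$\beta(-\alpha,t) = \left|\int_{-\alpha t/(t+1)}^{-\alpha t/(t-1)} \phi(z)\,dz\right| = \beta(\alpha,t).$$

Thus,

$$\beta^*(t) = \sup_{\alpha>0} \beta(\alpha,t).$$

For $\alpha > 0$ we have,

$$\frac{\partial\beta}{\partial\alpha}(\alpha,t) = \frac{t}{t-1}\,\phi\!\left(\frac{\alpha t}{t-1}\right) - \frac{t}{t+1}\,\phi\!\left(\frac{\alpha t}{t+1}\right) = 0,$$

so that

$$\exp\left\{\frac{1}{2}\left(\frac{\alpha t}{t-1}\right)^2 - \frac{1}{2}\left(\frac{\alpha t}{t+1}\right)^2\right\} = \frac{t+1}{t-1}$$

and taking logs we obtain

$$\frac{\alpha^2 t^2}{(t^2-1)^2}\left[(t^2+2t+1) - (t^2-2t+1)\right] = 2\log\left(\frac{t+1}{t-1}\right)$$

from where

$$\alpha^* = \frac{t^2-1}{t}\sqrt{\frac{1}{2t}\log\left(\frac{t+1}{t-1}\right)}$$

and

$$\beta^*(t) = \int_{(t-1)\sqrt{\frac{1}{2t}\log\left(\frac{t+1}{t-1}\right)}}^{(t+1)\sqrt{\frac{1}{2t}\log\left(\frac{t+1}{t-1}\right)}} \phi(y)\,dy.$$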

With a calculator and a normal table we find that for $t = 5$, $\alpha^* = 1.0796$ and $\beta^* = .1$, so
the confidence is 90% for $x \pm 5|x|$. Other intervals can be computed in a similar way. In
fact this shows that

$$P[\,X - 5|X-a| < \mu < X + 5|X-a|\,] \ge .90$$

for all $a \in \mathbb{R}$, $\mu \in \mathbb{R}$ and $\sigma > 0$.
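The bound can also be evaluated numerically instead of with a table. The sketch below assumes the
standard normal case and uses `statistics.NormalDist` from the Python standard library; the helper
names `beta`, `alpha_star` and `beta_star` are introduced here for illustration only. `alpha_star`
solves the stationarity condition $\partial\beta/\partial\alpha = 0$ for $\alpha > 0$ in closed form, which is one
reading of the derivation above and not necessarily the author's own computation.

```python
from math import log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf

def beta(alpha, t):
    """P(|Y| > t|Y - alpha|) for Y ~ N(0, 1), with alpha > 0 and t > 1."""
    return PHI(alpha * t / (t - 1)) - PHI(alpha * t / (t + 1))

def alpha_star(t):
    """Positive root of d(beta)/d(alpha) = 0 (assumed closed form)."""
    return (t * t - 1) / t * sqrt(log((t + 1) / (t - 1)) / (2 * t))

def beta_star(t):
    """Largest miss probability over alpha, i.e. sup of beta(., t)."""
    return beta(alpha_star(t), t)

if __name__ == "__main__":
    # the values of t are arbitrary illustrations
    for t in (5, 9, 20):
        print(t, round(alpha_star(t), 4), round(beta_star(t), 4))
```

Since $\beta(\cdot, t)$ is flat near its maximum, values of $\alpha$ around 1 all give $\beta$ close to .1 for
$t = 5$, so the 90% statement is not sensitive to the exact location of the maximizer.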

The best $a$ is the one that produces the shortest expected length. But, length $= L = 2t|X-a|$ and

$$E(L) = 2t\,E(|X-a|) \propto E(|X-a|)$$

so that the best $a = a^*$ should minimize $E(|X-a|)$, i.e. $a^*$ must be the median of $X$, and
since $X$ is symmetric about $\mu$ we have $a^* = \mu$. Hence, the best $a$ is our best a priori guess
for $\mu$. This looks like Bayesianism sneaking into classical confidence intervals!

The arbitrariness of $a$ in the statement "$x \pm t|x-a|$ is a $(1-\beta^*(t))100\%$ CI for $\mu$"
reminds me of the Stein shrinking phenomenon. Perhaps this is part of the reason why
Robbins got interested in it. Recall that Robbins' Empirical Bayesianism produces Stein's
estimators as a special case.

3. Confidence Intervals for $\mu$, Non-parametric Case

Let $\Im_s$ be the class of all unimodal, symmetric about zero densities. Given a single
observation of $X$ with $X \sim f(x-\mu)$ where both $f \in \Im_s$ and $\mu \in \mathbb{R}$ are unknown, find a
$100(1-\beta)\%$ CI for $\mu$ of finite length.

Robbins' Answer: Consider first the following simple lemma:
Lemma: If $f \downarrow$ in $(0,+\infty)$ then

$$l(x) = \frac{1}{b-x}\int_x^b f(y)\,dy \ \downarrow \ \text{in } (0,b).$$

Proof: This is obvious from the picture (see Fig. 2), since $l(x)$ denotes the mean value of
$f$ on $(x, b)$. Of course the algebra gives the same answer. Notice that

$$l(x) \le \frac{1}{b-x}\,f(x)\,(b-x) = f(x).$$

Fig. 2. The mean value of f(y) decreases when x approaches b

Thus, differentiating both sides of the equation

$$(b-x)\,l(x) = \int_x^b f(y)\,dy,$$

we obtain

$$l'(x) = \frac{1}{b-x}\,[\,l(x) - f(x)\,] \le 0$$

i.e. $l(x)$ decreases in $(0,b)$.
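Consider as before the event

$$A = [\,|X-\mu| > t|X-a|\,] \quad \text{for } t > 1 \text{ and } a \in \mathbb{R}.$$

Then, if $Y = X - \mu$, we have

$$P(A) = P[\,|Y| > t|Y-\alpha|\,] \quad \text{with } \alpha = a - \mu \in \mathbb{R},$$

$$P(A) = \beta(\alpha,t) = \beta(-\alpha,t) \quad \text{since } f \in \Im_s.$$

But now applying the Lemma for $x = \alpha t/(t+1) > 0$ and $b = \alpha t/(t-1)$ we obtain

$$\beta(\alpha,t) = (b-x)\,l(x) \le (b-x)\,l(0) = \left(1 - \frac{x}{b}\right)\int_0^b f(y)\,dy \le \frac{1}{2}\left(1 - \frac{t-1}{t+1}\right) = \frac{1}{t+1}.$$

Hence,

$$P(A) \le \frac{1}{t+1} \quad \text{for all } \alpha \in \mathbb{R} \text{ and } f \in \Im_s.$$

Therefore

$$P[\,X - t|X-a| < \mu < X + t|X-a|\,] \ge 1 - \frac{1}{1+t}$$

holds for all $a \in \mathbb{R}$, $\mu \in \mathbb{R}$, and $f \in \Im_s$.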

Example: For $t = 9$, we have $1 - 1/(1+t) = .9$, and $x \pm 9|x-a|$ will cover $\mu$ at least 90% of
the time even if we are uncertain about $f \in \Im_s$. This suggests the following game: Each time
you pick a function $f$ in $\Im_s$ in any way you want, i.e. deterministically or stochastically
with some distribution. Then you choose $\mu \in \mathbb{R}$, also in an arbitrary way, i.e. picking each $\mu$
afresh every time, or following a pre-specified sequence, or generating them from a distribution
that changes each time, etc. Then use the computer to show me $x \sim f(x-\mu)$. I win
\$1 if $x \pm 9|x|$ covers your $\mu$ and you win \$5 if it doesn't. Do you want to play a couple of
hundred times?
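The game is easy to try out. The following sketch is only one possible opponent: it assumes four
familiar unimodal symmetric families (normal, Laplace, uniform, triangular) with random scales, and
$\mu$ drawn uniformly from $(-20, 20)$. All of these choices, and the names `draw_noise` and `play`,
are illustrative assumptions rather than anything prescribed by the text.

```python
import random

def draw_noise(rng):
    """Draw from a randomly chosen unimodal density symmetric about zero.

    The four families and the scale range are assumptions for this sketch.
    """
    kind = rng.choice(["normal", "laplace", "uniform", "triangular"])
    scale = rng.uniform(0.1, 10.0)
    if kind == "normal":
        return rng.gauss(0.0, scale)
    if kind == "laplace":
        # the difference of two iid exponentials has a Laplace distribution
        return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    if kind == "uniform":
        return rng.uniform(-scale, scale)
    return rng.triangular(-scale, scale, 0.0)

def play(rounds=20000, t=9, seed=1):
    """Return (coverage frequency, my net winnings in dollars)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(rounds):
        mu = rng.uniform(-20.0, 20.0)      # opponent's choice of mu
        x = mu + draw_noise(rng)           # the single observation
        if abs(x - mu) < t * abs(x):       # x +/- t|x| covers mu (a = 0)
            wins += 1
    losses = rounds - wins
    return wins / rounds, wins * 1 - losses * 5

if __name__ == "__main__":
    print(play())
```

The bound cannot be improved in general: for the uniform density on $(-1, 1)$ and $\mu = 8/9$ the
interval $x \pm 9|x|$ misses exactly when the noise falls in $(-1, -0.8)$, which has probability 1/10.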

4. Confidence Intervals for $\sigma$

We consider now the estimation of the scale parameter from a single observation. It should 
be noticed that the only interesting confidence intervals are those of finite length. Thus, 
(0, oo) is a 100% confidence interval but useless. 

The natural, invariant under re-parameterizations, measure of length for a confidence
interval $(a, b)$ for a scale parameter is not just $b - a$ but proportional to the difference in
the logarithmic scale, i.e. $\log b - \log a$. This follows by recalling the fact that the square of
the element of length, on the hypothesis space of the location-scale model, along a line of
constant scale is given by:

$$ds^2 = g_{\sigma\sigma}\,(d\sigma)^2$$

where $g_{\sigma\sigma}$ is the Fisher information amount at $\sigma$ given by:

$$g_{\sigma\sigma} = \frac{k-1}{\sigma^2}$$

with

$$k = 4\int_{-\infty}^{\infty} y^2\,(\psi'(y))^2\,dy$$

and $\psi^2 = f$ in the notation of the proposition below. Hence, the geodesic distance from
the probability distribution with scale "a" to the probability distribution with scale "b" is
obtained by integrating the element of length,
$\int_a^b \sqrt{g_{\sigma\sigma}}\,d\sigma = \sqrt{k-1}\,(\log b - \log a)$, and is therefore proportional to the
difference in the log scale as noted above. The reader unfamiliar with the geometry of hypothesis
spaces may use the expression of the Kullback number between the gaussian with mean zero and
standard deviation "a" and the gaussian with mean zero and standard deviation "b" as an
approximation to the geodesic distance, to convince him/herself of the logarithmic nature
of this length.
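For the Gaussian case the suggestion can be made concrete. The Kullback number between $N(0,a^2)$
and $N(0,b^2)$ has the standard closed form $\log(b/a) + a^2/(2b^2) - 1/2$. The sketch below (the
function name `kl_gauss_scale` is chosen here) checks that it depends on the two scales only through
$\log b - \log a$, and that for nearby scales it behaves like $(\log b - \log a)^2$.

```python
from math import exp, log

def kl_gauss_scale(a, b):
    """Kullback number between N(0, a^2) and N(0, b^2), for a, b > 0."""
    return log(b / a) + a * a / (2 * b * b) - 0.5

if __name__ == "__main__":
    # same ratio b/a, hence the same Kullback number
    print(kl_gauss_scale(1.0, 2.0), kl_gauss_scale(10.0, 20.0))
    # for a small log-scale difference r the number is close to r^2
    r = 0.01
    print(kl_gauss_scale(1.0, exp(r)), r * r)
```

Rescaling both standard deviations by the same factor leaves the result unchanged, which is the
invariance the logarithmic metric is meant to capture.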

It is therefore necessary to consider confidence intervals with non-zero lower bounds,
since $\sigma = 0$ is in fact a line at infinity. I show below that it is possible to have finite length
confidence intervals for the scale parameter from a single observation, but only if we rule
out a priori from the hypothesis space a bit more than the line $\sigma = 0$. It is this interplay
between geometry, classical inference and bayesianism that I find appealing in this problem.

Proposition: Let $f$ be a pdf symmetric about $0$ and differentiable everywhere. Let $F$
be the associated cdf. Let $0 < t_1 < t_2 < \infty$ with $f'(t_1) < f'(t_2)$ and define

$$G(\alpha,t_1,t_2) = F(\alpha-t_1) + F(\alpha+t_2) - F(\alpha-t_2) - F(\alpha+t_1).$$

Fig. 3. Illustration of event A

Let $M > 0$, $a \in \mathbb{R}$, $\mu \in \mathbb{R}$, $\sigma > 0$ be given numbers. Then if

$$|\mu - a| \le \sigma M \quad \text{and} \quad X \sim \frac{1}{\sigma}\, f\!\left(\frac{x-\mu}{\sigma}\right)$$

we have

$$P\left[\frac{|X-a|}{t_2} < \sigma < \frac{|X-a|}{t_1}\right] \ge 2[F(t_2) - F(t_1)]\; I[M \le M^*] + I[M > M^*] \inf_{0<\alpha<M} G(\alpha,t_1,t_2),$$

where $M^* = \min\{\alpha > 0 : G(\alpha,t_1,t_2) = G(0,t_1,t_2)\}$. If $f = N(0,1)$ (or any other pdf
with similar tails) an excellent approximation is

$$M^* = t_2 + F^{-1}(2F(t_1) - 1).$$



Proof: Consider the event

$$A = \left[\frac{|X-a|}{t_2} < \sigma < \frac{|X-a|}{t_1}\right].$$

Let

$$Y = \frac{X-\mu}{\sigma}.$$

Then by adding and subtracting $\mu$ inside the absolute values and dividing through by $\sigma$ we
obtain

$$A = [\,t_1 < |Y-\alpha| < t_2\,]$$

where $\alpha = (a-\mu)/\sigma$ is such that $|\alpha| \le M$. Notice that the $y$'s satisfying the inequalities
that define the event $A$ correspond to the shaded region in Fig. 3.
Hence,

$$P(A) = \int_{\alpha-t_2}^{\alpha-t_1} f(y)\,dy + \int_{\alpha+t_1}^{\alpha+t_2} f(y)\,dy = G(\alpha,t_1,t_2)$$
Fig. 4. Illustration of the event A

Notice that for given values $t_1$ and $t_2$ the function $G$, as a function of $\alpha$, is twice differentiable
and symmetric about zero with a local minimum at $\alpha = 0$. Indeed, using the fact that
$f(y) = f(-y)$ we have

$$\left.\frac{dG}{d\alpha}\right|_{\alpha=0} = [\,f(\alpha-t_1) - f(\alpha-t_2) + f(\alpha+t_2) - f(\alpha+t_1)\,]\Big|_{\alpha=0} = 0$$

and also, since $f'(-y) = -f'(y)$,

$$\left.\frac{d^2G}{d\alpha^2}\right|_{\alpha=0} = f'(-t_1) - f'(-t_2) + f'(t_2) - f'(t_1) = 2\,(f'(t_2) - f'(t_1)) > 0$$

whenever $f'(t_2) > f'(t_1)$.

Thus,

$$P(A) \ge G(0,t_1,t_2) = 2[F(t_2) - F(t_1)]$$

provided that $|\alpha| \le M^*$, i.e. if $M \le M^*$. The picture (see Fig. 4) illustrates the situation.

In the gaussian case, to obtain reasonable confidences we must have $t_1 < 1$ and $t_2 > 3$.
Hence, $F(\alpha-t_1) \approx F(\alpha+t_1) \approx F(\alpha)$ and $F(\alpha+t_2) \approx 1$. From where

$$G(\alpha,t_1,t_2) \approx 1 - F(\alpha-t_2) = 2[1 - F(t_1)] \approx G(0,t_1,t_2)$$

and the approximation for $M^*$ is obtained by solving the central identity for $\alpha$.
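The quality of the approximation for $M^*$ can be checked numerically in the Gaussian case. The
sketch below uses `statistics.NormalDist` from the Python standard library; the function names are
introduced here, and the bisection in `m_star_numeric` rests on the assumption that $G(\alpha) - G(0)$
changes sign exactly once between $\alpha = t_1$ and $\alpha = t_2 + 10$, which holds for the values tried
but is not proved here.

```python
from statistics import NormalDist

N01 = NormalDist()   # standard normal: the Gaussian case is assumed
F = N01.cdf

def G(alpha, t1, t2):
    """P(t1 < |Y - alpha| < t2) for Y ~ N(0, 1)."""
    return F(alpha - t1) + F(alpha + t2) - F(alpha - t2) - F(alpha + t1)

def m_star_approx(t1, t2):
    """Approximation M* = t2 + F^{-1}(2 F(t1) - 1)."""
    return t2 + N01.inv_cdf(2 * F(t1) - 1)

def m_star_numeric(t1, t2, iters=200):
    """Root of G(alpha) = G(0) for alpha > 0, found by bisection.

    Assumption: G(t1) > G(0) > G(t2 + 10) with a single crossing between.
    """
    g0 = G(0.0, t1, t2)
    lo, hi = t1, t2 + 10.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if G(mid, t1, t2) > g0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # the (t1, t2) pairs are illustrative choices
    for t1, t2 in [(1 / 8, 4.0), (1 / 17, 5.0)]:
        print(t1, t2, m_star_numeric(t1, t2), m_star_approx(t1, t2))
```

For these pairs the two values should agree to within a few hundredths, with the bisection value
slightly smaller because the approximation drops the term $F(\alpha - t_1) - F(\alpha + t_1)$.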
Remarks:

1) Notice that the lower bound of the confidence interval, i.e. $|x-a|/t_2$, is positive only
if $M < \infty$, i.e. if we know a priori that $|\mu - a| \le \sigma M < \infty$.

2) When $t_2 \to \infty$ then $M^* \to \infty$ and with no prior knowledge (i.e. $|\mu - a| < \infty$) we
still have

$$P\left[\,0 < \sigma < \frac{|X-a|}{t_1}\,\right] \ge 2(1 - F(t_1)).$$

3) The value of $t_2$ is related to the amount of prior information. The larger $t_2$, the
weaker the prior information necessary to assure the desired confidence. On the other hand
$t_1$ controls the confidence associated with the interval. These remarks are illustrated with
examples.

Examples: Let $x$ be a single observation from a gaussian with unknown mean $\mu$ and
unknown variance $\sigma^2$. Then 90% CIs for $\sigma$ are:

$(0,\ 8|x|)$ valid always
$(|x|/4,\ 8|x|)$ valid if $|\mu| \le 2.7\sigma$
$(|x|/8,\ 8|x|)$ valid if $|\mu| \le 6.7\sigma$

95% CIs are:

$(|x|/5,\ 17|x|)$ valid if $|\mu| \le 3.3\sigma$
$(|x|/50,\ 17|x|)$ valid if $|\mu| \le 48\sigma$
$(0,\ 17|x|)$ valid always.

99% CIs are:

$(|x|/5,\ 70|x|)$ valid if $|\mu| \le 2.7\sigma$
$(|x|/1000,\ 70|x|)$ valid if $|\mu| \le 997\sigma$
$(0,\ 70|x|)$ valid always.
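The figures in these examples can be cross-checked against Remark 2 and the approximation for $M^*$.
The sketch assumes the Gaussian case and the Python standard library; the function names are
introduced here. `confidence_always` evaluates $2(1 - F(t_1))$ for an upper endpoint $|x|/t_1$, and
`t2_for_bound` inverts $M^* = t_2 + F^{-1}(2F(t_1) - 1)$ to see which $t_2$ corresponds to a stated
bound on $|\mu|/\sigma$.

```python
from statistics import NormalDist

N01 = NormalDist()   # standard normal: the Gaussian case is assumed

def confidence_always(t1):
    """Confidence of (0, |x|/t1) for sigma with no prior knowledge."""
    return 2 * (1 - N01.cdf(t1))

def t2_for_bound(t1, m):
    """Solve the approximation M* = t2 + F^{-1}(2 F(t1) - 1) for t2."""
    return m - N01.inv_cdf(2 * N01.cdf(t1) - 1)

if __name__ == "__main__":
    # upper multipliers 8, 17, 70 and the stated bounds on |mu|/sigma
    for upper, bounds in [(8, (2.7, 6.7)), (17, (3.3, 48)), (70, (2.7, 997))]:
        t1 = 1 / upper
        print(upper, round(confidence_always(t1), 4),
              [round(t2_for_bound(t1, m), 2) for m in bounds])
```

With $t_1 = 1/70$ the first quantity comes out slightly below 0.99, so the 99% label is best read
as a rounded figure.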



Almost Real Example 

I'll try to show that the prior knowledge required to have non-zero lower bounds
for the CIs is in fact often available. Suppose that we want to measure the length of the
desk in my office with a regular meter graduated in centimeters. Let $x$ be the result of a
single measurement and let $\mu$ be the true length of my desk. Then

$$x = \mu + \epsilon \quad \text{with} \quad \epsilon \sim N(0,\sigma^2)$$

is a reasonable and very popular assumption. Now, even before I make the measurement I
can write with all confidence that for my desk $\mu = 2 \pm 1$ m, i.e. $|\mu - 2| \le 1$. With the meter
graduated in centimeters I will be guessing the middle line between centimeters so I can be
sure that $x = \mu\ \pm$ at least $\frac{1}{4}$ of a centimeter. Thus, working in meters,

$$3\sigma \ge \frac{1}{400}.$$

Therefore I can be absolutely sure that

$$|\mu - 2| \le 1200\,\sigma.$$

Hence,

$$\left(\frac{|x-2|}{1500},\ 70|x-2|\right)$$

will be a 99% CI for $\sigma$.
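A quick simulation shows how this interval behaves when the prior information holds and when it
fails. The sketch below is an illustration under the stated Gaussian model, with lengths in meters;
the function name `desk_coverage` and the particular $(\mu, \sigma)$ pairs are choices made here.

```python
import random

def desk_coverage(mu, sigma, n=20000, seed=2):
    """Frequency with which (|x-2|/1500, 70|x-2|) covers sigma.

    Illustrative helper: x is drawn from N(mu, sigma^2), units are meters.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        d = abs(x - 2.0)
        if d / 1500 < sigma < 70 * d:
            hits += 1
    return hits / n

if __name__ == "__main__":
    # pairs satisfying |mu - 2| <= 1200 * sigma
    for mu, sigma in [(2.0, 0.01), (3.0, 1 / 1200), (1.5, 0.005)]:
        print(mu, sigma, desk_coverage(mu, sigma))
    # a pair violating it: |mu - 2| = 3000 * sigma
    print(3.0, 1 / 3000, desk_coverage(3.0, 1 / 3000))
```

The least favorable admissible case is $\mu = 2$, where the coverage is close to $2(1 - F(1/70))$. When
$|\mu - 2|$ exceeds $1500\sigma$ by a wide margin the lower endpoint overshoots $\sigma$ and the interval
essentially never covers, which is why the prior bound matters.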



